Biber Redux: Reconsidering Dimensions of Variation in American English

Authors

  • Rebecca J. Passonneau
  • Nancy Ide
  • Songqiao Su
  • Jesse Stuart
Abstract

Genre classification has been found to improve performance in many applications of statistical NLP, including language modeling for spoken language, domain adaptation of statistical parsers, and machine translation. It has also been found to benefit retrieval of spoken or written documents. At its base, however, classification assumes separability. This paper revisits the assumption that genre variation is continuous along multiple dimensions, as well as an early use of principal component analysis to find these dimensions. Results on a very heterogeneous corpus of post-1990s American English reveal four major dimensions: three echo those found in prior work, while the fourth depends on features not used in the earlier study. The resulting model can provide a basis for more detailed analysis of sub-genres and of the relation between genre and situations of language use, as well as a means to predict distributional properties of new genres.

Similar resources

Conversation text types: A multi-dimensional analysis

Multi-dimensional (MD) analysis is a methodological approach that applies multivariate statistical techniques (especially factor analysis and cluster analysis) to the investigation of register variation in a language. The approach was originally developed to analyze the full range of spoken and written registers in a language. Early studies focused on English register variation (Biber 1985, 198...

Book Review: Dimensions of Register Variation: A Cross-Linguistic Comparison

Many readers will be familiar with the quantitative and distributional techniques of register analysis presented by Biber (1988). The present book re-uses this earlier study on variation in British English [henceforth E], along with "a synthesis" (p. xv) of Ph.D. studies done with Biber at the University of Southern California: by Niko Besnier on Nukulaelae Tuvaluan [T], Mohamed Hared on Somali...

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

Applying Multi-Dimensional Analysis to a Russian Webcorpus: Searching for Evidence of Genres

The paper presents an application of Multidimensional (MD) analysis, initially developed for the analysis of register variation in English (Biber, 1988), to the investigation of a genre-diverse corpus built from modern texts of the Russian Web. The analysis is based on the idea that each linguistic feature has different frequencies in different registers, and statistically stable co-oc...

Native and Non-native English Teachers’ Rating Criteria and Variation in the Assessment of L2 Pragmatic Production: The Speech Act of Compliment

Pragmatic assessment and consistency in rating are among the subject matters that still need more thorough investigation. The issue matters because inconsistent ratings undermine test fairness and lead to widely divergent scores. Our principal concern in this study was observing the criteria that American...

Publication date: 2014